Skip to content

Manage Grafana Service Accounts from the Grafana CR #1907

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ndk
Copy link

@ndk ndk commented Mar 17, 2025

#1469

See full de# proposal #003.


Summary

This PR introduces declarative management of Grafana Service Accounts (SAs) through the Grafana CR.
The operator now provisions, updates, and cleans up SAs, their API tokens, and the corresponding Kubernetes Secrets, keeping them in sync with the desired state.

Architecture highlights

  • Dedicated reconciler - GrafanaServiceAccountReconciler watches the same Grafana objects as GrafanaReconciler but is scoped to SA concerns only.
  • Condition driven - It executes only after the parent instance reports GrafanaReady == True; otherwise it requeues.
  • Status‑patch strategy - Both reconcilers patch (rather than update) the status sub‑resource to minimise write conflicts.
  • Drift correction - Manual edits or deletions made directly in Grafana are detected and reconciled back to the spec.
  • Security model - Because Grafana's API lacks labels/annotations, reconciliation relies on the operator‑managed status.serviceAccounts list to avoid accidentally taking ownership of user‑managed SAs/tokens.

What's implemented

  • spec.grafanaServiceAccounts field on the Grafana CR.
  • Full create / update / disable / delete life‑cycle for SAs, tokens, and Kubernetes Secrets.
  • Chainsaw‑based e2e suite in tests/e2e/grafanaserviceaccount/chainsaw-test.yaml.
  • Status fields & conditions wired for detailed progress reporting.

TODO

  • Unit‑test coverage for reconciler logic.
  • Support for writing token Secrets into a custom namespace.
  • Automatic token rotation via the expires attribute.
  • User‑facing documentation & sample manifests.
  • Merge the dependent feat_grafana_conditions branch (condition‑based flow) and rebase this PR.
  • Re‑evaluate folding SA reconciliation back into GrafanaReconciler once conditions are in place to avoid dual‑reconciler status clashes.

Known limitations / caveats

  1. Status single source of truth - Reconciliation logic treats status.serviceAccounts as the single ownership ledger.

    • Failure example: Suppose the controller creates an SA successfully but crashes before patching status. On the next run it sees the SA missing from status, issues a new create call, receives a 409 already exists error, and halts.
    • Strict vs permissive adoption: Currently the controller refuses to adopt or overwrite resources it did not create. Relaxing this rule could let it recover gracefully by adopting any "orphaned" SAs/tokens/secrets, boosting resilience. The downside is a larger attack surface—an attacker or mis‑configured script could inject a rogue resource that the operator would silently assume.
  2. Dual‑reconciler vs single‑reconciler design - Keeping GrafanaServiceAccountReconciler separate isolates SA logic and shortens unit‑test feedback loops, but it also:

    • Doubles the number of controllers touching the same CR
    • Increases status‑patch contention
    • Adds another work‑queue
      An alternative is to invoke SA reconciliation inside GrafanaReconciler after the stage loop, gated by GrafanaReady. This would avoid write conflicts at the cost of a fatter reconciler and slightly later SA provisioning.
  3. Default organisation only - All API calls target the default Grafana org; multi‑org support is out of scope for this PR.

  4. Same‑namespace Secrets - Until cross‑namespace write support is added, token Secrets are created in the Grafana namespace.

Previous Discussion Summary

  • Separate CRD vs. embedded spec - Keep service‑account config embedded in the Grafana CR. A standalone CRD would complicate matching one SA to multiple instances and secret handling.
  • Same‑namespace assumption - With the config embedded, service accounts live in the same namespace as their parent Grafana.
  • Secret location & scope - We do want the ability to write the token Secret to a user‑specified namespace; to be added.
  • Permissions sub‑spec - Move to a follow‑up PR; Enterprise licences can be enabled in CI when that time comes.
  • Reconciler placement - Maintainers will migrate the operator from stage‑based logic to Kubernetes‑style Conditions. I'll rebase on that once it lands.
  • Finalizers - Not required now; we won't delete SAs for external Grafana instances, mirroring current behaviour for dashboards/folders.
  • orgId support - Restrict to the default organisation for this PR and revisit later.

@github-actions github-actions bot added the documentation Issues relating to documentation, missing, non-clear etc. label Mar 17, 2025
@weisdd weisdd added the feature this PR introduces a new feature label Mar 19, 2025
Copy link
Collaborator

@Baarsgaard Baarsgaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the PR! :D
As this is WIP I have mostly commented on style rather than functionality or wording.

Since you diligent with adding logs, I would like to direct you to: https://github.com/grafana/grafana-operator/blob/master/controllers/contactpoint_controller.go#L209
It's one of the few examples of debug level logs in the project by adding the .V(1)
It could be useful to you if you've been wanting to add more logs but have been holding back.
In case you have questions, feel free to leave comments!

@ndk ndk force-pushed the master branch 9 times, most recently from 44516fc to 297df4d Compare March 31, 2025 07:40
@ndk ndk force-pushed the master branch 6 times, most recently from fc35937 to 45a06ca Compare April 7, 2025 09:57
@ndk ndk force-pushed the master branch 7 times, most recently from 4d3109a to 881b3c0 Compare April 28, 2025 06:30
@ndk ndk force-pushed the master branch 4 times, most recently from 75871ab to f96b8a8 Compare May 5, 2025 08:19
@ndk ndk force-pushed the master branch 2 times, most recently from f62728d to c1fc015 Compare May 7, 2025 12:10
@ndk ndk changed the title WIP Add grafana service account feature to Grafana controller WIP Manage Grafana Service Accounts from the Grafana CR May 7, 2025
@ndk ndk changed the title WIP Manage Grafana Service Accounts from the Grafana CR Manage Grafana Service Accounts from the Grafana CR May 7, 2025
@ndk ndk requested a review from Baarsgaard May 7, 2025 13:47
@ndk ndk force-pushed the master branch 3 times, most recently from 7d582d5 to bdbccc4 Compare May 10, 2025 07:26
@theSuess
Copy link
Collaborator

Thank you for your continued investment in this feature - it's very appreciated!

We took some time to answer your questions

  1. Separate CRD vs. embedded spec

If it doesn't singificantly simplify the implementation, we'd prefer for service accounts to be specified as part of the Grafana CR.
Having a seperate resource would mean allowing one SA to be matched to multiple grafana instances which complicates storage & retrieval of secrets

  1. Same-namespace assumption

When keeping the service accounts as part of the Grafana CR, splitting across namespaces is not possible.

  1. Secret location & scope

This is a good point and being able to specify the namespace for the secret is something that we definetly want

  1. Permissions sub-spec

This isn't really a questions but more of a topic for a follow-up PR. For what
it's worth, we can get enterprise licenses in CI if we implement this
functionality.

  1. Placement of the SA reconciler

We discussed this and came to the conclusion that conditions are probably the
cleanest solution. We'll draft up a PR which migrates from the stage based
reconciliation logic to one based on conditions which you can rebase this PR on.
I don't have a timeframe for this but we'll keep you updated.

  1. Finalizers

The only situation in which a finalizer would be applicable would be with
external instances. However, I don't think we need to implement this for now as
I'm in favor of not deleting service accounts for external instances as this
might have unintended consequences. This also keeps consistent with other
resources as we don't delete dashboards/folders etc. when deleting the Grafana
instance.

  1. orgId support

This is something we can explore in a follow up PR. Let's keep it to the default
orgId for now to reduce the scope and make it easier to review.


Again thanks for all the work! If you have more questions or want to chat about this PR, feel free to join our maintainer meeting every Monday at 11:00 CEST. If you prefer async communication, discussing this further in the PR is also always possible :)

conditions:
- lastTransitionTime: "2025-05-14T21:57:17Z"
  message: Grafana reconcile completed
  observedGeneration: 3
  reason: GrafanaReady
  status: "True"
  type: Ready
@ndk ndk force-pushed the master branch 3 times, most recently from 14c215f to f283415 Compare June 2, 2025 07:23
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
documentation Issues relating to documentation, missing, non-clear etc. feature this PR introduces a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants